ADAPTIVE TV PROGRAM RECOMMENDER

BACKGROUND OF THE INVENTION 
Field Of The Invention 

[0001] The invention relates to recommending television shows 
based on a user profile. 


Description Of The Related Art 

[0002] U.S. Patent 5,758,259 shows a method for identifying a 
preferred television program based on a "correlation" between the 
program and predetermined characteristics of a user profile. The 
term "correlation" as used in the patent does not appear to relate
to the mathematical concept of correlation, but rather, is a very 
simple algorithm for assessing some similarity between a profile 
and a program. 

SUMMARY OF THE INVENTION

[0003] It is an object of the invention to improve techniques of 
automatic program recommendation. 

[0004] This object is achieved by using a probabilistic 
calculation, based on a viewer profile. The probabilistic 
calculation is preferably based on Bayesian classifier theory.

[0005] The object is further achieved by maintaining a local
record of a viewer history. The local record is preferably
incrementally updatable. The local record is advantageous for
privacy reasons, and can be contrasted with methods, such as
collaborative filtering, which would require viewer history
information to be uploaded to a central location. The use of
incremental updates is advantageous in minimizing storage
requirements.

[0006] It is a still further object of the invention to improve 
the classical Bayesian classifier technique. 

[0007] In one embodiment, this object is achieved by noise 
filtering. 

[0008] In another embodiment, this object is achieved by
applying a modified Bayesian classifier technique to
non-independent feature values.

[0009] Further objects and advantages of the invention will be 
described in the following. 

[0010] Bayesian classifiers are discussed, in general, in the
textbook of Duda & Hart, "Pattern Classification and Scene Analysis"
(John Wiley & Sons 1973). An application of Bayesian classifiers to
document retrieval is discussed in "Learning Probabilistic User
Models", by D. Billsus & M. Pazzani.



BRIEF DESCRIPTION OF THE DRAWING 



[0011] The invention will now be described by way of
non-limiting example with reference to the following drawings, in
which:

[0012] Fig. 1 shows a system on which the invention may be used;

[0013] Fig. 2 shows major elements of an adaptive recommender;

[0014] Fig. 3 shows pseudo code for a viewing history generator;

[0015] Fig. 4 shows a table of key fields;

[0016] Fig. 5 shows a viewer profile;

[0017] Fig. 6a shows a prior probability calculation;

[0018] Fig. 6b shows a conditional probability calculation; and

[0019] Fig. 6c shows a posterior probability calculation.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0020] Fig. 1 illustrates hardware for implementing the
invention. The hardware includes a display 1, some type of
processor 2, some type of user entry device 4 connected to the
processor 2 via some type of connection 3, and some type of link 5
for receiving data, such as television programming or Electronic
Programming Guide ("EPG") data. The display 1 is commonly a
television screen, but could be any other type of display device.
The processor 2 may be a set-top box, a PC, or any other type of
data processing device, so long as it has sufficient processing
power. The user entry device 4 may be a remote control unit and the
connection 3 may be an infrared connection. If the processor is a
PC, there will commonly be more than one user entry device, e.g., a
keyboard and mouse. The user entry device may also be a
touch-sensitive display. The link 5 to the outside world could be
an antenna, cable, a phone line to the Internet, a network
connection, or any other data link. Equally well, link 5 could
connect to a memory device or several memory devices.

[0021] Fig. 2 illustrates major elements of an embodiment of an
adaptive recommender. These elements preferably reside as software
and data in a medium 110 readable by a data processing device, such
as processor 2. The elements include a viewing history data
structure 101 that gives input to profiler software 102. The
profiler software, in turn, produces the viewer profile 103. The
terms "user profile" and "viewer profile" shall be used
interchangeably herein. The viewer profile serves as an input to
recommender software 104. The recommender software also uses, as an
input, the EPG data structure 105, which contains features
describing each show, such as title, channel, start time and the
like. An output of the recommender 104 appears on a user interface
106 where a user can interact with it.

[0022] The viewing history data structure includes selected
records from the EPG database. EPG databases are commercially
available, for instance, from Tribune Media Services. Those of
ordinary skill in the art may devise other formats, possibly with
finer shades of description. The selected records minimally
correspond to TV shows watched by the viewer. It is assumed that
these records have been deposited in the viewing history data
structure 101 by software that is part of the user interface and
knows what shows the viewer has viewed. Preferably, the software
allows recording of a user watching more than one show in a given
time interval, as users often switch back and forth during
commercials, and so forth. Preferably, also, the software records a
program as watched regardless of how long it was watched, and
regardless of whether it was watched at broadcast time or taped for
later viewing.

[0023] The preferred viewing history format assumes the presence
of both positive and negative records in the viewing history. This
is needed because the goal is to learn to differentiate between the
features of shows that are liked and those not liked. Fig. 3 shows
pseudo code for collecting the viewing history.

[0024] Let the notation C+ denote the set of positive (i.e.,
watched) shows and C- denote the set of negative (i.e., not
watched) shows.

[0025] The viewer profile includes a number of feature value 
counts. These counts are incremented whenever new entries are 
deposited in the viewer history. Usually, each program has several 
feature values. Accordingly, the deposit of a program in the viewer
history causes the update of counts associated with all feature 
values associated with that program. 

[0026] The incremental updatability of this type of history is 
advantageous because it allows for ongoing adaptation of the viewer 
history without a large amount of storage or computing effort being
required. 

[0027] In addition to the count of the number of positive and
negative entries (k(C+), k(C-)), a count of occurrences of
individual features is also kept among the positive and negative
examples (k(fi|C+), k(fi|C-)), where fi denotes feature i and
k(fi|C+) denotes the number of shows in set C+ that possess feature
fi. The feature set includes entries in the EPG records extracted
from selected key fields, an example of which is shown in Table 1,
which is Fig. 4.
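
The incremental counting described above can be sketched roughly as
follows. This is only an illustration in Python, not the code of
Fig. 3 or Fig. 5: the class name ViewerProfile, the method add_show,
and the (feature type, feature value) tuples are hypothetical names
chosen for the example.

```python
from collections import Counter


class ViewerProfile:
    """Feature value counts for watched (C+) and not-watched (C-) shows.

    Hypothetical structure for illustration only.
    """

    def __init__(self):
        self.k_pos = 0            # k(C+): number of watched shows
        self.k_neg = 0            # k(C-): number of not-watched shows
        self.k_f_pos = Counter()  # k(fi|C+) per feature value
        self.k_f_neg = Counter()  # k(fi|C-) per feature value

    def add_show(self, features, watched):
        """Update every count touched by one new viewing history entry."""
        unique = set(features)    # count each feature value once per show
        if watched:
            self.k_pos += 1
            self.k_f_pos.update(unique)
        else:
            self.k_neg += 1
            self.k_f_neg.update(unique)


profile = ViewerProfile()
profile.add_show([("channel", "5"), ("genre", "comedy")], watched=True)
profile.add_show([("channel", "7"), ("genre", "comedy")], watched=False)
```

Because only counts are stored, each new history entry costs a few
counter increments and the raw history need not be retained.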

[0028] A partial example of an embodiment of such counts is
presented in Table 2, shown in Fig. 5, to illustrate the idea. The
list as presented in Fig. 5 has six columns to save space, but, in
fact, the list has only three columns, with the bottom part of
Table 2 being presented next to the top part. Each row of Table 2
has four pieces of data, i.e., a feature type and a feature value
in the first column, a positive count in the second column, and a
negative count in the third column. The positive count indicates
the number of times a program having that feature value has been
watched. The negative count indicates the number of times a program
having that feature value has not been watched.

[0029] A television program schedule normally includes several,
if not many, programs for every time slot in every day. Normally,
the user will only watch one or two of the programs in any given
time slot. If the viewer profile contains a list of ALL the
programs not watched, the number of programs not watched will far
exceed the number of programs watched. It may be desirable to
create a method for sampling the programs not watched. For
instance, as the processor assembles the viewer profile, the
processor may choose a single not-watched program at random from
the weekly schedule as a companion for each watched program, as
suggested in the pseudo-code of Fig. 3. This design will attempt to
keep the number of positive and negative entries in the viewing
history about equal so as not to unbalance the Bayesian prior
probability estimates, discussed below.

[0030] It is not generally desirable to choose a companion
program from the same time slot as a watched program. Experiment
has shown that the combined time and day feature value is typically
the strongest or one of the strongest predictors of whether a
particular program will be preferred. Thus, another program at the
same time as the watched program may well be a second or third
choice program, while a program at a totally different time may be
very undesirable. Accordingly, it is preferred to choose the
companion program at random from the program schedule of the entire
week that includes the watched program.
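
A minimal sketch of this companion sampling, assuming Python, is
given below. It is not the pseudo code of Fig. 3. The function name
pick_companion, the dictionary keys "title" and "slot", and the
sample schedule are hypothetical and serve only to illustrate drawing
one not-watched program at random from the whole week.

```python
import random


def pick_companion(weekly_schedule, watched_keys, rng=random):
    """Pick one not-watched show at random from the whole week.

    Shows are dicts with hypothetical "title" and "slot" keys. A show
    is identified by its (title, slot) pair, so the same title at a
    different slot counts as a different program.
    """
    candidates = [
        show for show in weekly_schedule
        if (show["title"], show["slot"]) not in watched_keys
    ]
    return rng.choice(candidates) if candidates else None


schedule = [
    {"title": "News", "slot": "Mon 18:00"},
    {"title": "Quiz", "slot": "Tue 20:00"},
    {"title": "News", "slot": "Tue 18:00"},
]
watched_keys = {("Quiz", "Tue 20:00")}
companion = pick_companion(schedule, watched_keys, random.Random(0))
```

Pairing each watched show with one such companion keeps the positive
and negative counts roughly equal.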

[0031] Since time and day feature values for a program are often
so important in determining whether a program will be of interest
to a user, it is typically undesirable to consider two programs of
identical content to be the same if they are shown on different
days and/or at different times. In other words, a particular
episode of a series may be strongly preferred if it is shown at 8
p.m. on Tuesday, while the same episode of the same series may be
completely undesirable if it is shown at 10 a.m. on Monday. Thus,
the episode at 10 a.m. should be considered a different program
from the episode at 8 p.m., even though the content of the two is
identical.

[0032] As more and more shows are viewed, the length of the
profile will tend to grow larger and larger. To combat this, and to
keep the focus on features that are effective discriminators, the
following are recommended:

- periodic reviews of the features in the viewer profile, and

- removal of words that appear to be frequent and not very
  discriminating.

[0033] In general, it will be fairly easy to make recommendations
for those of simple tastes, e.g., those who only like to watch
football, after a viewer profile has been collected for a
relatively short time. For those of more complex preferences, it
will take longer for the viewer history to be sufficiently
meaningful to make good recommendations. These latter people,
however, are those who are probably most in need of a
recommendation.

[0034] In the final analysis, viewer histories will always be
ambiguous. Recommendations of shows based on such histories will
always contain a margin for error. The recommendations can, at
best, be said to have some probability of being correct. Therefore,
probabilistic calculations are useful in analyzing viewer profile
data to make recommendations.

[0035] The preferred embodiment of the recommender uses a simple
Bayesian classifier using prior and conditional probability
estimates derived from the viewer profile. How recommendations are
shown to viewers is not defined here, yet it will be assumed that
one can capture the viewer's response to them, at least by
observing whether or not they were watched.

[0036] Below, a 2-class Bayesian decision model is discussed.
The two classes of TV shows of interest are:

C1 - shows that interest the viewer

C2 - shows that do not interest the viewer

Other classes might be used showing more shades of interest or lack
thereof.

[0037] In contrast with the classes of interest listed above, 
viewing history obtains information only on the classes: 

C+ - shows the viewer watched

C- - shows the viewer did not watch 

[0038] Determining which shows a user watched or did not watch
is outside of the scope of this application. The user might enter a
manual log of which shows he/she watched. Alternatively, hardware
might record the user's watching behavior. Those of ordinary skill
in the art might devise numerous techniques for this. It should be
possible to consider shows as watched even if they are watched only
for a short time, as a user may be switching back and forth between
several shows, trying to keep track of all of them.

[0039] Inferences may be made about classes C1 and C2 based on
observations, but these inferences will always contain an element
of uncertainty. The Bayesian model will compute the prior
probabilities P(C+) and P(C-) directly from the counts in the
viewer profile in accordance with Fig. 6a. In other words, the
assumption will be that shows not watched are those the viewer is
not interested in, and that the shows watched are the ones that the
viewer is interested in.

[0040] The conditional probabilities, that a given feature, fi, 
will be present if a show is in class C+ or C-, are then computed
in accordance with Fig. 6b. These calculations can be performed 
once a day during times that the TV is not being viewed and stored 
in the viewer profile. 
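
Figs. 6a and 6b are not reproduced in this text. Assuming they take
the usual relative-frequency form, P(C+) = k(C+) / (k(C+) + k(C-))
and P(fi|C+) = k(fi|C+) / k(C+), the two estimates could be sketched
in Python as follows. The function names prior and conditional and
the sample counts are hypothetical.

```python
def prior(k_pos, k_neg):
    """Return (P(C+), P(C-)) as relative frequencies of the two classes.

    Assumes the viewing history holds at least one entry.
    """
    total = k_pos + k_neg
    return k_pos / total, k_neg / total


def conditional(k_f_class, k_class):
    """Return P(fi|C) = k(fi|C) / k(C) for one class with k(C) > 0."""
    return k_f_class / k_class


# Illustrative counts: 40 watched shows, 40 not watched,
# 10 of the watched shows carry the feature value in question.
p_pos, p_neg = prior(k_pos=40, k_neg=40)
p_feature_pos = conditional(k_f_class=10, k_class=40)
```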

[0041] Recommendations for upcoming shows can be computed by
estimating the posterior probabilities, i.e., the probability that
a show is in class C+ or C- given its features. Let x be a binary
vector (x1, x2, ..., xi, ..., xn) where i indexes over the features
in the viewer profile, and where xi = 1 if feature fi is present in
the show being considered for recommendation, and 0 if not. For the
exclusive features, like day, time, and station, where every show
must have one and only one value, the index i will be taken to
indicate the value present in the show being considered, provided
that this value is also present in the profile. Otherwise, novel
exclusive features will not enter into the calculations. For
non-exclusive features, the index i will range over all values
present in the profile; non-exclusive features novel to the
considered show will not contribute to the calculations. The
posterior probabilities are estimated in accordance with Fig. 6c.
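
Fig. 6c is not reproduced in this text. As a hedged illustration, the
sketch below assumes the standard two-class naive Bayes form for
binary features, in which a present feature contributes P(fi|C) and
an absent one contributes 1 - P(fi|C), followed by normalization. The
function name posterior and the sample probabilities are
hypothetical, and the conditional probabilities are assumed to be
already free of zeros.

```python
def posterior(x, p_pos, p_neg, cond_pos, cond_neg):
    """Return (P(C+|x), P(C-|x)) for a binary feature vector x.

    cond_pos[i] and cond_neg[i] hold P(fi|C+) and P(fi|C-).
    Assumed naive Bayes form; not taken from Fig. 6c itself.
    """
    score_pos, score_neg = p_pos, p_neg
    for xi, cp, cn in zip(x, cond_pos, cond_neg):
        score_pos *= cp if xi else 1.0 - cp
        score_neg *= cn if xi else 1.0 - cn
    total = score_pos + score_neg  # normalize so the two values sum to 1
    return score_pos / total, score_neg / total


post_pos, post_neg = posterior(
    x=[1, 0],
    p_pos=0.5, p_neg=0.5,
    cond_pos=[0.8, 0.1],
    cond_neg=[0.2, 0.5],
)
recommend = post_pos > post_neg
```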

[0042] With these estimates in hand, a show will generally be
recommended if P(C+|x) > P(C-|x) and the "strength" of the
recommendation will be proportional to P(C+|x) - P(C-|x). One
potential problem with this scheme is that some conditional
probabilities are likely to be zero. Any zero in a chain
multiplication will reduce the result to zero; hence, some means
for eliminating zeros is needed. The Billsus and Pazzani article
referenced above presents a couple of schemes, including simply
inserting a small constant for any zeros that occur.

[0043] One method for dealing with zeros in the conditional
probability multiplication chain would be as follows. One can
choose a heuristic of 1000. If the number of shows in the viewing
history is less than 1000, then the value of 1/1000 can be
substituted for zero. If the number of shows in the viewing history
is greater than 1000, the correction can be

\[ \frac{k_{i+} + 1}{k_{+} + 2} \]

where k_{i+} is the number of watched shows having feature i and
k_{+} is the total number of watched shows. This is what is called
the Laplace correction in the Billsus and Pazzani article. This
Laplace correction must also be done for the not watched shows.
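
One possible reading of this zero-handling rule is sketched below in
Python. The function name corrected_conditional is hypothetical. Two
points are assumptions rather than statements of the text: the
Laplace correction is applied to every count once the history exceeds
the threshold, and a history of exactly 1000 shows is treated like a
smaller one.

```python
def corrected_conditional(k_i, k_class, history_size, threshold=1000):
    """Estimate P(fi|C) without ever returning zero.

    k_i is the number of shows in the class having feature i and
    k_class is the total number of shows in the class. The same
    function serves for watched and not-watched shows.
    """
    if history_size > threshold:
        # Laplace correction (assumed to apply to all counts).
        return (k_i + 1) / (k_class + 2)
    if k_i == 0:
        # Small history: substitute a small constant for zero.
        return 1 / threshold
    return k_i / k_class


small_history_zero = corrected_conditional(0, 40, history_size=80)
small_history_seen = corrected_conditional(10, 40, history_size=80)
large_history_zero = corrected_conditional(0, 998, history_size=2000)
```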

[0044] Alternative schemes may be devised by those of ordinary
skill in the art.

[0045] Classical Bayesian theory would require the use of all
accumulated elements of the list of Fig. 5 in making a
recommendation. Nevertheless, in some instances, it may be useful
to use a noise cutoff, eliminating features from consideration if
insufficient data about them appears in the list. For instance, if
a particular feature did not appear in more than some given
percentage of shows considered, whether in negative or in positive
count, it might be ignored in determining which recommendation to
make. Experimentally it was found that a cutoff of 5% was far too
large.

[0046] Rather than use a percentage, one embodiment of the noise
cutoff would use the viewer profile itself to determine the cutoff.
This embodiment would first take a subset, or sub-list, of the
viewer profile relating to particular feature types. For instance,
a sub-list might advantageously comprise all of the elements of the
viewer profile relating to the feature types, i.e., time of day and
day of the week. Alternatively, in another example, the sub-list
might advantageously comprise all of the elements of the viewer
profile relating to channel number. Generally, the feature type or
types chosen should be independent feature types, in other words,
feature types which do not require another feature type to be
meaningful.

[0047] The sub-list is then sorted by negative count, i.e., by
number of shows having a particular feature value and not being
watched. The highest negative count in this sorted list can be
viewed as the noise level. In other words, since, in the preferred
embodiment, the "not watched" shows are chosen at random from the
week's program schedule, any not watched time slot can be
considered to be noise.

[0048] Thus, any feature having both a positive and a negative
count at or below the noise level need not be considered in the
Bayesian calculation in making a recommendation. This example of
noise level thresholding uses a particular feature, e.g., day/time,
as the one for determining the noise cutoff. In general, any
feature that is uniformly randomly sampled by the negative example
sampling procedure may be chosen by those of ordinary skill in the
art for the calculation of the noise threshold.
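
The profile-derived noise cutoff can be illustrated with the
following Python sketch. The function names noise_level and
above_noise, the feature type label "daytime", and the sample counts
are all hypothetical. The profile is assumed to map (feature type,
feature value) pairs to (positive count, negative count) pairs.

```python
def noise_level(counts, reference_type):
    """Highest negative count among values of the reference feature type.

    counts maps (feature_type, value) to (positive_count, negative_count).
    """
    negatives = [
        neg for (ftype, _), (_, neg) in counts.items()
        if ftype == reference_type
    ]
    return max(negatives, default=0)


def above_noise(counts, reference_type="daytime"):
    """Drop features whose positive AND negative counts are at or below
    the noise level; keep all others."""
    level = noise_level(counts, reference_type)
    return {f: c for f, c in counts.items() if c[0] > level or c[1] > level}


# Made-up counts for illustration only.
counts = {
    ("daytime", "Tue 20:00"): (12, 1),
    ("daytime", "Mon 10:00"): (0, 3),
    ("actor", "A. Example"): (2, 1),
    ("genre", "comedy"): (9, 4),
}
kept = above_noise(counts)
```

Taking the maximum is equivalent to reading the top entry of the
sub-list after sorting it by negative count.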

[0049] The calculations of Figs. 6a-6c are advantageous in that 
they require fairly low computing power to complete and are 
therefore readily adaptable to modest hardware, such as would be
found in a set-top box. 

"Surprise Me" Feature 

[0050] Recommendations according to the above-described scheme
will be programs having a preponderance of features that are
present in shows that have been watched. The viewer profiles
accumulated will not yield any meaningful recommendations with
respect to shows having few features in common with watched shows.
Accordingly, optionally, the recommender may occasionally recommend
shows at random, in a "surprise me" feature, if such programs have
relatively few features in common with watched shows.
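
The text does not say how "relatively few features in common" is
measured. The Python sketch below assumes a simple count of shared
feature values with a hypothetical max_shared limit; the function
name surprise_pick and the sample shows are likewise invented for
illustration.

```python
import random


def surprise_pick(shows, profile_features, max_shared=1, rng=random):
    """Pick a random show sharing at most max_shared feature values
    with the profile (assumed measure of unfamiliarity)."""
    unfamiliar = [
        show for show in shows
        if len(set(show["features"]) & profile_features) <= max_shared
    ]
    return rng.choice(unfamiliar) if unfamiliar else None


profile_features = {("genre", "football"), ("channel", "5")}
shows = [
    {"title": "Match Day",
     "features": [("genre", "football"), ("channel", "5")]},
    {"title": "Opera Night",
     "features": [("genre", "opera"), ("channel", "9")]},
]
pick = surprise_pick(shows, profile_features, max_shared=0,
                     rng=random.Random(1))
```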

Using The User Profile In Other Domains 

[0051] Once a user profile is developed, the recommendation 
techniques of the invention might be used to recommend other types
of items, such as movies, books, audio recordings, or even 
promotional materials, such as tee-shirts or posters. 

Non-Independence Of Features

[0052] The classical assumption in the domain of Bayesian
classifier theory is that all features are independent. Therefore,
if a feature is, say, often present in positive shows, but is
missing from a show being considered for recommendation, that fact
should count against the show. However, this may yield undesirable
results for the current application.

[0053] For example, let us assume that there are five day/time
slots indicated in the user profile as being most watched. Let us
assume further that a particular show being evaluated falls within
one of those five slots. The calculation of Fig. 6c would then give
rise to an increase in probability for the day/time slot that
matches and a decrease for the four day/time slots that do not
match. Intuitively, it appears that the latter decrease is not
reasonably related to an accurate determination of probability for
the show in question. The different values of day/time are not
independent, as every show has one and only one value, so the
values a show does not have should not count against it.

[0054] To remedy this deficiency in the classical Bayesian
approach, it is proposed to divide features into two types: Set
1 and Set 2. If a feature is designated Set 1, the Bayesian
calculation will ignore any non-matching values of the feature. If
the feature is designated Set 2, then the normal Bayesian
calculation, per Fig. 6c, will be done.

[0055] Normally in a television application, Set 1 would include
day/time; station; and title. Some features which have values only
for a few shows, e.g., critic ratings, should also be Set 1,
because too many shows would be non-matching merely because critics
tend to rate only a tiny percentage of shows.

[0056] Set 2, for television shows, would normally include all 
features that can have several values per show, such as actor. 
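
The Set 1 and Set 2 distinction can be sketched in Python as follows.
The sketch assumes the same naive Bayes form as a conventional
two-class classifier with binary features, since Fig. 6c is not
reproduced here. The function name modified_posterior, the feature
type labels, and the sample probabilities are hypothetical.

```python
def modified_posterior(show_features, p_pos, p_neg, cond, set1_types):
    """Posterior estimate that skips non-matching values of Set 1 features.

    cond maps (feature_type, value) to (P(f|C+), P(f|C-)).
    set1_types holds the feature types designated Set 1.
    """
    score_pos, score_neg = p_pos, p_neg
    for feature, (cp, cn) in cond.items():
        if feature in show_features:
            # Matching value: contributes for both Set 1 and Set 2.
            score_pos *= cp
            score_neg *= cn
        elif feature[0] not in set1_types:
            # Set 2: an absent value counts against the show as usual.
            score_pos *= 1.0 - cp
            score_neg *= 1.0 - cn
        # Set 1: an absent value is ignored.
    total = score_pos + score_neg
    return score_pos / total, score_neg / total


cond = {
    ("daytime", "Tue 20:00"): (0.5, 0.1),
    ("daytime", "Wed 21:00"): (0.4, 0.1),
    ("actor", "A. Example"): (0.2, 0.4),
}
show = {("daytime", "Tue 20:00")}
post_pos, post_neg = modified_posterior(
    show, 0.5, 0.5, cond, set1_types={"daytime", "station", "title"}
)
```

In this example the unmatched Wednesday slot leaves the estimate
unchanged, while the absent actor still enters through the
1 - P(f|C) terms.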

From reading the present disclosure, other modifications will be
apparent to persons skilled in the art. Such modifications may
involve other features which are already known in the design,
manufacture and use of television interfaces and which may be used
instead of or in addition to features already described herein.
Although claims have been formulated in this application to
particular combinations of features, it should be understood that
the scope of the disclosure of the present application also
includes any novel feature or novel combination of features
disclosed herein either explicitly or implicitly or any
generalization thereof, whether or not it mitigates any or all of
the same technical problems as does the present invention.
Applicants hereby give notice that new claims may be formulated to
such features during the prosecution of the present application or
any further application derived therefrom.

[0057] The word "comprising", "comprise", or "comprises" as used
herein should not be viewed as excluding additional elements. The
singular article "a" or "an" as used herein should not be viewed as
excluding a plurality of elements.






